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1.  Introduction 


Monitoring  a  comprehensive  test  ban  treaty  involves  the  difficult  problem  of  differentiating 
the  seismic  signal  of  nuclear  events  from  the  overwhelming  amount  of  seismic  signals  from 
earthquakes,  mining  explosions,  etc.  This  problem  is  made  even  more  difficult  due  to  the 
lack  of  information  concerning  the  behavior  of  nuclear  signals.  Furthermore,  in  many 
regions,  little  information  is  available  on  the  background  events. 

Wang,  et.  al.  (1996)  frame  the  problem  of  detecting  nuclear  events  in  terms  of  detecting 
outliers  (nuclear  events)  from  a  mixture  population  (earthquakes,  mining  explosions,  etc.). 
The  authors  develop  a  modified  likelihood  ratio  test  that  requires  no  distributional 
assumptions  concerning  the  outlier  distribution,  which  is  a  powerful  practical  solution  to 
the  lack  of  well  established  training  samples  for  nuclear  events.  Using  the  bootstrap  to 
model  the  distribution  of  the  test  statistic  and  calculate  critical  values,  the  authors  show 
this  modified  test  is  as  powerful  as  the  standard  likelihood  test  in  which  complete 
information  concerning  the  distribution  of  the  outlier  population  is  known.  However,  the 
training  data  used  to  model  the  mixture  distribution  was  assumed  to  be  "pure",  meaning 
that  information  concerning  the  source  of  the  events  in  the  training  data  is  assumed  to  be 
known.  A  brief  summary  of  the  test  statistic  and  the  bootstrap  procedure  is  given  in 
Appendix  B. 

In  this  report,  the  problem  of  detecting  the  rare  seismic  signal  from  events  in  new  or 
relatively  unexplored  regions  is  studied.  Extending  the  work  of  Wang,  et.  al.  (1996),  a  two 
stage  procedure  is  introduced  that  first  examines  a  potential  training  sample  from  a 
previously  unexplored  region  for  potential  outliers,  readying  the  training  sample  for  testing 
new  data.  Second,  additional  information  is  incorporated  in  an  attempt  to  identify  the 
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sources  of  the  events  in  the  training  sample  for  research  purposes.  However,  we  should 
stress  in  this  approach  introduced  here,  it  is  not  necessary  to  be  able  to  label  events.  The 
only  assumption  made  regarding  the  nature  of  previous  events  is  that  none  of  them  are 
nuclear  events. 

2.  The  Procedure 

In  this  section,  the  procedure  for  screening  data  about  which  little  information  about  the 
source  of  the  events  is  known,  such  as  data  collected  from  a  new  region  of  interest,  is 
developed.  The  data  are  assumed  to  be  uncontaminated,  i.e.  containing  no  nuclear  events, 
and  are  known  to  contain  a  fixed  number  of  groups.  These  groups  represent  the  types  of 
non-nuclear  seismic  activity  in  a  region.  Examples  include  earthquakes  and  mining  blasts. 
The  data  can  include  any  number  of  characteristics  calculated  from  the  seismic  signals.  A 
flowchart  of  the  procedure  is  given  in  Figure  1. 

The  first  step  involves  clustering  the  data  using  the  method  of  hierarchical  clustering 
discussed  in  Appendix  A.  After  the  clustering  stage,  initial  estimates  of  the  parameter 
values  are  calculated  from  the  estimated  group  membership.  It  is  possible  that  in  certain 
situations,  particularly  when  the  groups  in  the  training  sample  overlap,  the  clustering 
algorithm  will  return  one  or  more  clusters  that  are  considerably  smaller  than  is  reasonable. 
In  such  cases,  these  small  clusters  are  temporarily  set  aside,  and  the  remaining  data  is  used 
to  form  clusters  and  estimate  initial  parameters. 

Now,  each  point  is  considered  individually  by  the  using  the  other  n  —  1  points  as  a  pseudo 
training  sample.  The  modified  likelihood  test  developed  in  Wang,  et.  al.  (1996)  is  used  to 
test  each  point  and  determine  the  probability  that  each  point  belongs  to  the  assumed 
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Figure  1 
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mixture  population.  Any  point  with  a  significant  result  (probability  of  inclusion  in  the 
mixture  is  small)  at  this  phase  is  labeled  an  outlier  and  removed  from  the  future  training 
data.  The  other  points  tested  as  belonging  to  the  mixture  can  now  be  used  as  a  "clean" 
training  data  set  for  testing  inclusion  of  future  events  in  the  mixture. 


2.1  Example 

The  data  for  this  example  are  the  result  of  an  analysis  of  earthquakes  and  mining 
explosions  from  the  Vogtland  region  near  the  Czech-German  border  done  by  Relu  Burlacu 
and  Eugene  Herrin  from  the  Department  of  Geological  Sciences  at  the  Southern 
Methodist  University.  These  data  were  taken  from  the  ground  truth  database  put  together 
by  Grant,  et.  al.  (1993).  The  procedure  used  to  generate  the  measurements  is  new  to  the 
seismic  community  and  involves  fitting  a  low-order  (usually  third)  autoregressive  process 
to  the  Lg-phase  of  the  waveform.  The  power  spectral  density  is  estimated  and  the  strength 
and  frequencies  of  the  real  and  complex  poles  are  calculated.  Burlacu  and  Herrin  report 
that  a  characteristic  pattern  appears,  namely  that  distributed  surface  explosions  tend  to  be 
lower  frequency  with  a  sharper  spectrum  (strong  pole)  and  that  earthquakes  tend  to  have 
higher  frequencies  and  a  more  distributed  spectrum  (weak  pole).  These  features  are 
incorporated  into  a  promising  screening  process  to  identify  mining  blasts.  Here,  the 
complex  frequency  and  pole  strength  are  used  to  demonstrate  this  new  algorithm. 

Table  1  contains  information  on  the  events  used  in  this  study  and  Figure  2  shows  a 
scatterplot  of  the  complex  frequency  and  pole  for  each  event  (plotting  characters  indicate 
event  number).  Note  that  event  number  25  is  listed  in  the  ground  truth  data  base  as  an 
explosion,  although  some  controversy  has  surrounded  this  event.  For  this  example,  the 
ground  truth  information  is  not  used.  Rather,  the  source  for  each  event  is  assumed  to  be 
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Table  1 


Event  # 

Date 

Lat(N) 

Lonq(E) 

Or.  time 

Q/X 

1 

KMfEUi 

mssssM 

A 

2 

032191 

50.207 

12.685 

0 

2.05 

3.982 

12:04:15 

X 

3 

032291 

50.207 

12.685 

0 

2.03 

2,835 

12:3355 

X 

4 

032391 

50.207 

12.685 

0 

1.99 

2,025 

12:00:56 

X 

5 

032491 

50.296 

12.225 

12.9 

2.18 

- 

05:05:04 

Q 

6 

032491 

50.279 

12.228 

12.9 

1.50 

- 

05:3551 

Q 

7 

032491 

50.277 

12.240 

13.9 

1.40 

- 

06:57:59 

Q 

8 

032491 

50.278 

12.220 

12.4 

1.65 

- 

09:38:33 

Q 

9 

032491 

50.294 

12.223 

12.7 

2.07 

- 

14:33:28 

Q 

10 

032491 

50.293 

12.224 

12.5 

1.80 

- 

15:00:45 

Q 

11 

032491 

50.293 

12.224 

9 

1.73 

- 

15:41:04 

Q 

12 

032591 

50.298 

12.222 

12.9 

2.37 

- 

14:54:14 

Q 

13 

032591 

50.292 

12.213 

12.4 

1.54 

- 

22:31:46 

Q 

15 

050291 

50.207 

12.713 

0 

1.93 

3,575 

11:06:10 

X 

19 

051991 

50.360 

12.371 

0 

2.06 

- 

03:22:10 

Q 

20 

052391 

50.207 

12.713 

0 

2.12 

3,135 

11:01:05 

X 

21 

052591 

50.207 

12.713 

0 

2.13 

3,135 

11:01:29 

X 

23 

052891 

50.207 

12.685 

0 

2.01 

3,575 

11:03:51 

X 

24 

062091 

50.207 

12.685 

0 

1.98 

1,998 

11:01:17 

X 

25 

062091 

50.293 

12.803 

0 

1.80 

- 

11:45:35 

X 

26 

062291 

50.207 

12.685 

0 

2.15 

2,886 

10:58:34 

X 

27 

062791 

50.207 

12.685 

0 

1.93 

3,515 

11:04:40 

X 

5 


6-0  S’O  I'O  9*0  9*0  t^*0 
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unknown,  although  it  is  assumed  that  the  data  set  is  composed  of  observations  from  two 
sources  (earthquake  and  mining  blast).  Finally,  it  is  also  assumed  that  no  nuclear  events 
are  present  in  the  sample. 

Figure  3  shows  the  result  of  the  cluster  analysis.  The  members  of  each  cluster  are 
indicated  on  the  plot  (as  a  "1"  or  a  "2")  as  well  as  a  95%  contour  for  each  component 
normal  distribution  using  the  parameters  estimated  from  the  results  of  the  cluster  analysis. 
Note  that  the  labeling  of  clusters  is  arbitrary  and  does  not  indicate  the  source  of  the  event. 
These  data  show  a  clear  separation  between  the  groups.  Hence  only  one  iteration  of  the 
cluster  analysis  is  necessary. 


Figure  4  shows  the  results  of  the  leave-one-out  testing  procedure.  Plotted  are  the  p-values 
for  being  in  the  mixture  associated  with  each  frequency  and  pole  pair  (plotting  characters 
indicate  p-value).  Note  that  only  event  25  shows  a  significant  result  (p-value  =  0.01), 
which  leads  to  the  conclusion  that  event  25  is  an  outlier  to  the  mixture  distribution  of 
earthquakes  and  explosions.  Results  for  all  other  points  support  their  membership  in  the 
mixture  and  are  consistent  with  the  ground  truth  information. 

New  data  values  in  this  region  could  now  be  tested  using  this  '  clean  data.  In  fact.  Figure 
5  shows  contours  representing  effective  rejection  regions  (q  =  0.1,  0.05,  0.01)  based  on 
this  training  sample.  Note  that  these  regions  mirror  the  shape  of  the  distributions 
suggested  by  the  data. 
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Figure  3:  Results  of  Cluster 
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Appendix  A:  Hierarchical  Clustering 


Clustering  is  the  process  of  grouping  similar  objects  on  the  basis  of  characteristics  of  the 
objects.  The  process  is  also  referred  to  as  classification  or  pattern  recognition.  For  a 
general  treatment  of  the  subject,  see,  for  example,  Seber  (1984)  or  any  textbook  on 
multivariate  statistics. 


Two  basic  types  of  clustering  algorithms  exist.  The  first  is  hierarchical  clustering,  which  is 
an  iterative  technique  involving  the  grouping  of  smaller  clusters  into  larger  ones  until  the 
desired  number  of  clusters  has  been  achieved.  The  second  type  partitions  objects  into  non¬ 
overlapping  groups  by  setting  the  number  of  clusters,  choosing  initial  locations  of  the 
clusters,  and  then  assigning  points  to  one  of  the  groups  according  to  some  pre-specified 
criterion.  In  this  work,  a  two-stage  approach  to  clustering  is  taken.  First,  a  hierarchial 
approach  is  used  to  obtain  initial  parameter  estimates  of  the  clustering.  Then,  in  some 
cases,  a  procedure  similar  in  nature  to  the  fc-means  approach  of  Hartigan  (1975)  is  used  to 
refine  the  parameter  estimates  if  necessary. 

The  clustering  algorithm  begins  by  considering  each  of  the  n  data  pomts  as  an  individual 
cluster.  Then,  the  two  points  nearest  to  each  other  are  combined  to  form  n  -  1  clusters. 
The  procedure  continues  by  combining  or  fusing  the  two  clusters  that  are  the  most  similar 
at  each  iteration.  Here,  similarity  is  a  distance  measure  that  is  calculated  in  a  variety  of 
ways.  The  three  most  common  measures  of  similarity  are  single  linkage  (nearest  neighbor), 
complete  linkage  (farthest  neighbor),  or  the  nearest  centroid  measure.  For  single  linkage, 
the  similarity  between  two  clusters  is  measured  as  the  smallest  distance  between  a  point  of 
one  cluster  and  a  point  of  another.  On  the  other  hand,  complete  linkage  measures 
similarity  between  two  clusters  as  the  largest  distance  between  a  point  of  one  cluster  and  a 
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point  of  another.  The  centroid  method  measures  similarity  as  the  distance  between  the 
centroids  or  means  of  the  points  in  each  cluster. 

Both  the  single  and  complete  linkages  suffer  from  the  extreme  nature  of  the  measure  of 
similarity.  Single  linkage  causes  clusters  formed  by  a  single  individual  to  be  more  likely  to 
join  another  established  cluster  rather  than  forming  the  nucleus  of  a  new  cluster.  This 
phenomenon,  known  as  "chaining,"  leads  to  clusters  that  are  formed  of  smaller  groups 
linked  together  by  intermediates.  Complete  linkage  yields  the  opposite  effect  as  individuals 
are  more  likely  to  form  a  new  cluster  than  join  established  ones.  This  has  the  potential  to 
yield  clusters  that  are  formed  of  points  that  do  not  really  belong  together.  The  centroid 
method  is  more  robust  than  either  of  these  two  measures  and  is  less  subject  to  these 
problems  in  the  formation  of  clusters.  Hence,  the  centroid  method  is  the  measure  adopted 

in  this  work. 
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Appendix  B 


A  New  Test  for  Outlier  Detection  from 
a  Multivariate  Mixture  Distribution 


Suojin  Wang,  Wayne  A.  Woodward,  H.L.  Gray,  Stephen 


Wiechecki,  and  Stephan  Sain 


ABSTRACT 

The  problem  of  testing  an  outlier  from  a  multivariate  mixture  distribution  of  several 
populations  has  many  important  appHcations  in  practice.  One  particular  example  is  m 
monitoring  worldwide  nuclear  testing,  where  we  wish  to  detect  whether  an  observed  event  is 
possibly  a  nuclear  explosion  (an  outlier)  by  comparing  it  with  the  training  samples  from  mining 
blasts  and  earthquakes.  The  combined  population  of  seismic  events  from  minmg  blasts  and 
earthquakes  can  be  viewed  as  a  mixture  of  two  populations.  The  classical  likelihood  ratio  test 
appears  to  be  not  applicable  to  our  problem,  and  in  spite  of  the  importance  of  this  problem, 
little  progress  has  been  made  in  the  literature.  In  this  report  we  propose  a  simple  modified 
likelihood  ratio  test  that  overcomes  the  difficulties  in  the  current  problem.  Bootstrap 
techniques  are  used  to  approximate  the  distribution  of  the  test  statistic.  The  advantages  of  the 
new  test  are  demonstrated  via  simulation  studies.  Some  new  computational  findings  are  also 

reported. 
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C-0098  NSF  Grant  DMS-9504589,  an  NSA  grant  and  Texas  A&M  Scholarly  and  Creative 
Activitik  Program  95-59.  We  would  like  to  acknowledge  Steve  Sam  for  producmg  the 
graphics  for  the  figures. 
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1.  Introduction 

An  extremely  important  practical  problem  is  that  of  monitoring  worldwide  nuclear  testing, 
where  we  wish  to  detect  whether  an  observed  seismic  event  may  be  a  nuclear  explosion  by 
comparing  it  with  the  training  samples  obtained  from  previous  seismic  activity  in  the  region.  In 
this  case,  the  training  data  will  often  be  composed  of  data  which  are  a  composite  of  mining 
explosions  and  earthquakes.  Usual  methods  of  outlier  detection  typically  focus  on  the  setting  in 
which  observations  are  tested  as  outliers  from  a  single  population.  However,  in  the  case 
considered  here,  there  are  two  populations,  and  we  wish  to  test  whether  a  seismic  event  should  be 
considered  to  be  an  outlier  from  either  or  both  of  the  populations.  Actually,  these  results  are 
applicable  to  two  or  more  populations  but  we  focus  on  the  case  of  two.  Another  point  of  interest 
is  the  fact  that  the  setting  considered  here  differs  from  a  common  outlier  scenario  in  which  a 
sample  is  given  and  the  observations  from  the  sample  are  tested  to  determine  whether  they  should 
be  considered  as  outliers  from  the  population  from  which  the  sample  was  obtained.  This, 
however,  is  not  the  scenario  considered  here.  Specifically,  in  our  setting,  "pure"  samples  from  the 
populations  in  question  are  available,  and  our  desire  is  to  test  a  new  observation  as  an  outlier  from 
these  populations.  We  will  refer  to  this  testing  procedure  as  outlier  testing  throughout  the  report. 

The  classical  method  for  outlier  detection  of  the  type  we  are  addressing  is  the  likelihood 
ratio  test  (Wilks  (1963),  Caroni  and  Prescott  (1992)),  usually  under  the  normality  assumption  for 
the  multivariate  distributions  of  the  training  sample  population  and  the  outlier  population,  and 
under  the  assumption  of  equal  covariance  of  the  two  populations  under  the  alternative  hypothesis. 
The  resulting  test  is  essentially  the  Hotelling's  test  (see  Anderson  (1984)).  In  our  current 
problem,  because  of  the  fact  that  there  is  not  a  single  multivariate  normal  population  associated 
with  the  training  sample,  these  assumptions  are  not  satisfied.  Thus,  a  direct  application  of  the 
standard  likelihood  ratio  test  does  not  seem  possible.  In  spite  of  the  importance  of  this  problem, 
to  our  knowledge  little  progress  has  been  made  in  the  literature.  Baek  et  al.  (1992)  recently 
considered  the  outlier  testing  in  the  seismic  setting  discussed  here  but  in  the  special  case  in  which 
seismic  events  are  tested  as  outliers  from  a  single  population,  usually  earthquakes.  Baek  et  al. 
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(1992)  used  a  bootstrap  approach  to  ascertain  the  distribution  of  the  likelihood  ratio  when  the 
multivariate  distribution  associated  with  the  training  sample  has  both  continuous  components  and 
discrete  components  that  have  a  finite  number  of  possible  outcomes.  Some  assumptions,  such  as 
equality  of  covariances,  are  imposed  to  link  the  training  sample  population  and  the  outlier 
population.  It  is  possible  to  apply  the  test  of  Baek  et  al.  sequentially  to  each  training  sample 
population,  but  this  can  be  cumbersome,  e  g.  the  training  sample  populations  often  have  different 
covariance  structures.  Furthermore,  this  procedure  would  result  in  substantial  loss  of  power. 

In  this  report  we  consider  an  approach  to  the  practical  problem  at  hand  by  considering 
the  combined  population  of  seismic  events  ot  mining  blasts  and  earthquakes  as  a  mixture  of  two 
populations.  We  propose  a  simple  modified  likelihood  ratio  test  using  bootstrap  resampling  that 
appears  to  perform  well  in  this  setting.  The  methodology  is  presented  in  Section  2  for  testing 
outliers  from  a  mixture  population  consisting  of  m  components.  Some  numerical  procedures  are 
addressed,  including  the  use  of  the  bootstrap  for  approximating  the  distribution  of  the  test  statistic 
in  Section  3.  We  also  describe  how  the  intensive  computing  time  required  for  the  bootstrap 
resampling  can  be  reduced  without  loss  of  accuracy  when  the  training  sample  size  is  relatively 
large.  Section  4  provides  the  results  of  empirical  studies.  Some  concluding  remarks  are  given  in 

Section  5. 

2.  The  Methodology 

Suppose  we  have  a  mixture  distribution  Hof  m  populations,  Hi,  i  =  1,...,  m.  In  the 
nuclear  testing  example  mentioned  above,  m  =  2  for  mining  explosions  and  earthquakes.  Let  d 
be  the  dimension  of  the  variables  from  the  mixed  population  11,  and  for  clarity  in  the  presentation 
assume  all  the  distributions  are  continuous.  Note  that  extensions  to  discrete  or  mixed  cases  are 
mainly  a  matter  of  notational  adjustments.  The  density  of  the  mixture  distribution  is 

m 

f{x-,  e)  ='^Pigi{x;  9i),  (1) 

i=i 
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where  Pt  >  0  are  mixing  proportions  with  ^  Pt  =  1,  Pi  are  the  densities  of  flj,  di  are  unknown 

i=l 

parameter  vectors,  9  =  (pi,  ■■•,'Pm^  ®  ~  In  the  nuclear  monitoring 

scenario,  we  wish  to  test  whether  a  new  seismic  event  is  an  outlier  to  the  mixture  of  earthquakes 
and  mining  explosions.  More  generally,  we  wish  to  be  able  to  test  whether  a  new  observation  is 
an  outlier  from  the  mixed  population  IT. 

Assume  that  we  have  a  random  training  sample  of  size  n  from  the  mixture  population 

x^, 6  n , 

and  that  we  are  able  to  identify  the  associated  source  population  for  ni  <n  members  of  the 
training  sample.  For  convenience,  let 

eUi,  fori  =  l,...,m,  (2) 

where  0  =  fco  <  fci  <  ...  <  =  n^,  i.e.,n,  =  k,  -  fc,_i  (normally  >  10)  data  points  are 

identified  to  be  from  fl^.  Additionally,  we  allow  for  the  possibility  that  the  training  sample 
contains  nu  unlabeled  observations  from  the  mixture.  In  the  notation  of  Redner  and  Walker 
(1984)  we  assume  the  sample  .X^i, ...,  is  of  Type  4,  i.e.  the  training  sample  consists  of  labeled 
and  unlabeled  observations.  The  associated  n^'s,  i  =  1,  ,  m  are  random  variables  following  a 

multinomial  distribution,  and  they  contain  information  about  the  mixing  proportions.  In  this 
notation,  n  =  nL+nu.  If  in  fact  nu  =  0,  then  the  training  sample  consists  of  only  labeled 
observations  and  is  a  sample  of  Type  3  using  the  Redner  and  Walker  notation.  Now  a  new 
observation  Xn+i  is  obtained.  Given  (2)  we  want  to  test  the  following  hypotheses: 

Ho  :  Xn+i  €  n 

vs.  (3) 

Fi: 

The  classical  likelihood  ratio  test  statistic  is  the  ratio  of  the  maximized  likelihood  functions 
under  Hq  and  Hi.  Under  Hq  the  sample  is  of  Redner  and  Walker  Type  4,  i.e.  we  assume  that 


17 


are  as  before  while  is  unlabeled  but  from  the  same  mixture  distribution  as 

Xi,...,X  That  is,  we  assume  that  all  n  +  1  observations  are  from  the  mixture  distribution 
assumed  under  Hq  with  ni  of  these  labeled  and  nu  +  1  unlabeled.  The  likelihood  function  under 
Ho  is 


Lo{0)  = 


nil 


n  n  i  n 

'  ' 3=n£+l  ' 


) 


Let  h{x\  a)  be  the  density  associated  with  the  outlier  population  from  which  ^n+i  sampled, 
where  a  is  an  unknown  parameter  vector.  Then  the  likelihood  function  under  Hi  is 


Li(0,a)  = 


nil 


’mi:,  \  "  \ 

n  n 


(4) 


5=71^4-1 


Difficulties  arise  when  maximizing  Li  since  there  is  only  a  single  observation  from  the  outlier 
population  so  that  generally  no  suitable  MLE  is  possible  for  a,  unless  a  is  assumed  to  directly  link 
to  9.  Any  such  linkage  assumption  is  quite  questionable  since  we  now  have  m  individual 
populations  that  make  up  the  mixture  distribution.  Furthermore,  with  only  one  observation  it  is 
impossible  to  do  any  model  checking  of  h{x;  q).  To  overcome  these  difficulties  and  to  observe 
the  fact  that  little  information  is  known  about  the  outlier  population  from  which  Xj  is  sampled, 
we  simply  use  a  constant  density  h{x)  =  c  over  its  practical  (finite)  support.  Moreover,  the 
constant  density  is  also  assumed  in  the  bootstrap  procedure  described  below.  Thus,  dropping  me 
constant  from  the  likelihood  ratio  test  statistic  will  not  affect  any  test  conclusions.  Therefore  we 
let 


L  m  = 


nil 


„  I  „  I  l  11  11 


5=:nL  +  l 


which  is  the  likelihood  based  on  the  sample  Xu...,X,  from  the  mixture.  We  define  a  simple 
modified  likelihood  ratio  test  statistic 
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w  = 


(5) 


sup  Lo{6) 
gee 

sup  L  i(0)  ’ 

OeQ 

where  ©  is  the  entire  parameter  space.  It  is  easily  seen  that  the  departure  of  Xn+i  from  /  will 

reduce  sup Lo{6)  making  W  small.  Hence  the  rejection  region  is  of  the  form  W  <  Wa  for  some 
s&e 

Wa  picked  to  provide  a  level  a  test.  Since  the  null  distribution  ofW  has  no  known  closed  form, 
we  suggest  the  use  of  the  parametric  bootstrap  method  to  approximate  it,  as  shown  in  the  next 
section.  Based  on  the  discussion  here  the  use  of  W  seems  to  be  a  reasonable  approach,  and  in 
Section  4  we  demonstrate  that  W  performs  well  under  all  the  simulation  scenarios  considered. 

Concluding  this  section,  we  point  out  that  asymptotically  W  ^  6„ ),  as  n  — >  oo, 

where  is  the  MLE  using  the  training  sample  only.  See  the  Appendix  for  the  proof  Moreover, 
the  bootstrap-one  method  described  in  the  next  section  is  essentially  equivalent  to  using  this 
asymptotic  result. 

3.  The  Bootstrap  and  Other  Computational  Procedures 

In  this  section  we  discuss  numerical  issues  associated  with  the  test  procedure  described  in 
Section  3.  It  should  be  noted  that  often  both  the  numerator  and  denominator  oiW  in  (5)  may  be 
difficult  to  obtain  since  the  individual  densities  are  mixture  distributions.  Recall  also  that  for  the 
numerator  we  assume  that  X\,...,Xn  can  be  identiiied  with  their  component  population,  but 
Xn+i  is  only  known  to  be  from  the  mixture,  not  the  exact  component.  However,  if  we  consider 

the  setting  of  multivariate  normality  for  each  component,  i.e., 

gi(x-di)^N{ui,Ei),  (6) 

and  thus  /(x;  6)  is  a  mixture  of  m  multivariate  normal  distributions,  a  numerical  iteration 
algorithm  based  on  the  EM  algorithm  has  been  developed  by  Redner  and  Walker  (1984),  for 
maximizing  Lo{e).  They  extended  Hosmer's  (1973)  algorithm  for  the  case  of  two  univariate 

normal  components  to  the  multivariate  normal  components  setting,  and  in  our  simulation  studies, 

we  have  adapted  their  method.  Note  that  with  (6),  supL  i(0)  is  easily  obtained.  Using  the 

gee 
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resulting  estimator  \  as  an  initial  value  in  the  numerator,  it  only  takes  at  most  a  few  steps  to 
obtain  convergence. 

We  now  turn  to  bootstrapping  the  null  distribution  of  W.  We  will  employ  the  parametric 
bootstrap  based  on  the  training  sample  The  following  algorithm  is  used  which 

mimics  the  original  sampling  plan. 


Step  1:  Use  (2)  to  obtain  (pi,  Ui,  ti)  for  i  =  1,  ...,m  . 

Step  2;  For  each  integer  6,  6  =  1, ...,  B,  draw  a  sample  of  size  n.  from  the  multinomial 
distribution  with  p  =  (Pi,  ••• ,  observe  the  frequencies  n  ,  n.,^ ,  ... , 

and  where  Additionally,  we  draw  a  sample 

of  size  nu  from  the  same  multinomial  distribution  resulting  in  frequencies  n  , 
n./^. ,  ... ,  and  where  r.  n./^.  +  •  •  •  -r  =  n^. 

Step  3;  Draw  samples  of  size  T^^^and  from  N{ui,  Hi)  for  i  —  1, m.  The  ni 

observations  associated  with  frequencies  n-^,  i  =  1,  m  are  treated  as  labeled 
samples  in  the  analysis,  while  the  nu  observations  corresponding  to  n\y, 
j  =  1,  ...,  m  are  treated  as  unlabeled  observations.  These  resampled  data  are 
used  to  compute  the  test  statistic  in  (5).  This  test  statistic  is 


denoted  by 

Step  4:  Draw  a  new,  (n  +  l)st,  observation  from  the  empirical  mixture  by  randomly 

selecting  a  single  observation  from  the  multinomial  distribution  in  Step  2.  This 
multinomial  will  essentially  select  a  component  i  between  1  and  m,  and  we 
generate  an  observation  from  the  associated  N{ui,  2t)  distribution. 

Step  5:  Repeat  Steps  2  to  4  J5  times  {b  =  l,...,B).  Then  define  Wa  to  be  the 
(lOOa)th  percentile  of  all  W;.  Specifically,  if  a  =  j/{B  +  1),  then  is 
the  j’th  smallest  value  of  {W I }  j  (see  McLachlan,  1987).  Statistical  decisions 

can  then  be  made. 
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Notice  that  when  n  is  large  the  bootstrap  scheme  may  require  considerable  computing 
time.  However,  when  nj  are  not  very  small,  this  computational  burden  can  be  avoided  by 
employing  an  approximate  bootstrap  scheme,  called  bootstrap-one.  This  technique  uses  the 
original  training  sample  in  Steps  2  and  3  for  all  6  =  1, B.  It  effectively  eliminates  these  two 
steps  and  many  calculations  in  obtaining  Wa- 

The  bootstrap-one  method  conceptually  approximates  the  conditional  distribution  of  W 
given  When  all  are  relatively  large,  the  conditioning  effect  is  minimal.  The 

accuracy  and  advantages  of  the  bootstrap-one  method  are  among  the  things  studied  in  simulations 
which  are  discussed  in  the  next  section. 


4.  Empirical  Studies 

In  this  section  we  report  some  results  of  a  simulation  study  to  illustrate  the  performances 
of  the  new  methods.  In  these  simulations  we  focus  on  the  case  in  which  all  training  sample 
observations  are  labeled,  i.e.  nu  =  0. 

Example  1.  In  this  example,  we  choose  m  =  1,  d  =  2,  and  n  =  40so  that  the  training  sample 
is  from  a  bivariate  N(p,  S'),  where 


were  used.  Obviously,  in  this  case  since  there  is  only  one  component  in  the  '  mixture  ,  all 
observations  in  the  training  sample  can  be  labeled,  i.e.  nu  =  0.  The  reason  for  choosing  m  =  1  is 
that  in  this  case  it  is  easy  to  apply  the  standard  likelihood  ratio  test  assuming  that  the  outlier 
population  is  normal  with  the  same  covariance  E.  In  this  case,  there  is  a  single  training  sample  of 
size  71  and  an  observation  be  tested  as  an  outlier.  Baek  et  al.  (1992)  discusses  the 

generalized  likelihood  ratio  test  in  this  setting.  In  particular,  the  likelihood  ratio  statistic  is  given 

by 
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(8) 


sup  Loid) 

\  _  ^  €  Q _ 

sup  L\{9,a) 
e&Q 

where  L\{6.,  cx)  is  given  in  (4)  and  a  is  related  to  0  in  a  certain  way. 

Specifically,  h  is  the  multivariate  normal  density  associated  with  observation  Xn+i  and 
a  =  js  estimated  by  taking  p>2  ~  -^n+iand  taking  E  to  be  the  MLE  obtained  from  the 

training  sample.  Under  the  normality  assumption  in  this  example,  the  test  statistic  in  (8)  is  known 
to  be  distributed  as  Hotelling's  (e.g.  Anderson,  1984).  Baek  et  al.  (1992)  considered  the 
likelihood  ratio  in  (8),  where  the  multivariate  random  variables  could  be  composed  of  both 
continuous  and  discrete  components.  They  approximated  the  distribution  of  A  in  this  case  using 
the  bootstrap  procedure  described  here.  They  applied  the  bootstrap  procedure  to  the  special  case 
in  which  the  distributions  were  multivariate  normal  and  approximated  the  distribution  of  A  using 
the  bootstrap  procedure.  Simulations  have  shown  that  the  power  of  the  test  based  on  the 
bootstrap  is  very  similar  to  that  obtained  based  on  Hotelling's  T-  in  the  multivariate  normal  case. 
In  this  report  all  tests  are  based  on  the  use  of  bootstrap  resampling  to  approximate  the  distribution 
of  the  test  statistic.  The  test  based  on  (8)  will  be  called  the  "standard"  likelihood  ratio  test. 

Instead  of  including  Lx{6,  a)  in  the  denominator  of  (8)  in  this  multivariate  normal  setting, 
we  could  have  used  the  test  statistic  given  in  (5)  which  is  based  on  the  use  of  a  constant  density 
h{x)  =  c  for  over  its  support.  The  test  statistic  using  (5)  will  be  termed  the  "modified"  likelihood 
ratio  test.  For  each  of  these  tests,  whenever  we  approximate  the  distribution  of  the  test  statistic 
by  a  full  bootstrapping  of  n  + 1  observations,  we  will  refer  to  this  as  the  "full"  procedure. 
Alternatively,  in  each  case  we  also  consider  the  use  of  the  bootstrap-one  technique.  In  Table  1  we 
denote  them  as  "full"  and  "one"  respectively. 

Table  1  summarizes  the  simulation  results  of  the  two  tests.  One  thousand  replications 
were  used  for  each  entry  and  we  used  B  =  499.  The  power  was  obtained  with  (2)' 
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^  ^  ^  yjas  the  outlier  population.  We  have  experimented  with  other  covariance 

values,  including  that  in  (7),  and  similar  power  patterns  were  observed. 

First,  we  compare  the  standard  and  modified  tests  using  fiill  bootstrapping.  In  Table  1,  it 
can  be  seen  that  the  significance  levels  for  both  tests  are  close  to  the  nominal  level  of  o;  .05 
with  the  modified  tests  having  slightly  larger  levels.  Additionally,  the  powers  of  the  two  tests  are 
similar  with  the  modified  tests  having  somewhat  larger  power.  Thus,  the  use  ofW  in  (5),  which 
appropriately  reflects  our  ignorance  about  the  outlier  population,  performs  as  well  as  the  full 
likelihood  ratio. 

Next,  comparing  "One"  columns  to  "Full"  columns,  we  observe  that  the  bootstrap-one  has 
significance  levels  that  are  artificially  high  for  smaller  sample  sizes.  However,  for  large  n  (say  > 
100)  the  significance  levels  are  of  appropriate  size.  For  these  larger  sample  sizes  the  bootstrap- 
one  procedure  tended  to  have  higher  power  than  obtained  using  full  bootstrapping.  Based  on 
these  results  and  the  computational  burden  associated  with  large  n  suggests  that  the  bootstrap- 
one  is  a  viable  alternative.  Finally,  notice  that  the  bootstrap-one  method  is  identical  for  the 
standard  and  new  tests.  In  fact,  the  identity  can  be  shown  analytically  under  normality.  However, 
the  identity  is  not  true  in  general. 


Fxamnie  2.  In  this  example 've  consider  the  use  of  the  likelihood  ratio  test  to  test  for  outliers 
from  the  mixture  model  in  (1)  with  m  =  2  and  n  =  60.  Again  we  consider  the  case  in  which 
d  =  2  and  nu  =  0,  and  specifically,  we  assume  that  the  component  densities  and  are 
multivariate  normal  densities  associated  with  a 


N 


.5 

1 


and 
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populations  respectively. 


Case  a:  Pi  =  P2  =  -S- 

We  examine  the  power  of  the  test  for  detecting  outliers  from 


N 


1  fc  —  5 
1  -  (A;  -  5) 


population  where  A;  =  1, 9.  In  Figure  1(a)  we  show  data  from  a  mixture  of  two  populations 
with  Pi  =  P2  =  0-5  along  with  5  outliers.  In  Figure  1(b)  we  show  the  same  data  with  individual 
observations  labeled  with  regard  to  the  associated  component  population  or  outlier  population. 
The  outliers  are  indicated  by  solid  dots.  In  Figure  1(c)  we  again  show  the  labeled  data  along  with 
contours  of  the  mixture  population.  Finally,  in  Figure  1(d)  we  show  means  and  contours  of  the 
two  component  populations  and  of  the  outlier  population.  In  Figure  2  we  show  the  contours  of 
the  mixture  components  as  in  Figure  1(d)  along  with  the  outlier  means  (l+fc  5,  1  (Ai  5))  , 
k  =  1,...,9.  Also  in  this  figure  we  show  the  contour  of  the  outlier  population  for  the  case 
k  =  2,  i.e.  the  mean  is  (  -  2,  4)'.  In  Table  2(a)  n  =  60  is  used  and  the  nominal  level  is  a  =  0.05. 
As  can  be  seen,  the  significance  level  is  close  to  the  nominal  level.  Whenever  the  outlier 
population  is  well  separated  from  the  component  distributions  of  the  mixture  we  have  good  power 
while  as  would  be  expected  the  power  lowers  dramatically  for  k  near  5.  The  true  powers  for 
k  =  1,2,3,  and 4  are  the  same  as  those  for  k  =  9,8, 7 and 6  respectively,  due  to  symmetry.  The 
empirical  results  appear  to  verify  this  fact. 


Case  b:  p\  =  0.25  and  P2  =  0.75. 

In  this  case  we  consider  the  same  scenario  as  Case  a  but  with  pi  =  0.25  and  p2  =  0.75. 

In  Figure  3  we  show  the  plots  corresponding  to  Figure  1  for  the  case  in  which  p\  =  0.25  and 
P2  =  0.75,  and  in  Table  2(b)  we  show  results  corresponding  to  those  in  Table  2(a)  for  this  case. 
Again,  we  see  that  the  significance  levels  are  accurate  and  that  powers  are  similar  to  those  in 
Table  2(a).  It  should  be  noted  that  due  to  smaller  p\  here,  there  was  a  very  small  fraction 
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(  <  0.2%)  of  all  bootstrap  simulation  replications  that  did  not  converge  with  our  current 
program.  This  problem  seems  to  become  more  serious  when  smaller  values  of  n  are  used.  In  our 
analysis  we  simply  skip  any  bootstrap  realization  for  which  convergence  was  not  obtained  and 
generate  another  one.  Another  possible  approach  would  be  to  use  the  starting  values  as  final 
estimates  for  these  bootstrap  replications. 

5.  Concluding  Remarks 

In  this  report  we  have  proposed  a  simple  modified  likelihood  ratio  test  for  multivariate 
outlier  detections.  This  new  test  is  not  only  good  for  use  in  general  outlier  detection  problems, 
but  especially  applicable  when  the  training  sample  population  is  a  mixture  of  several  populations. 
In  the  new  test  no  assumption  is  necessary  for  the  covariance  structure  or  any  other  moments  of 
the  outlier  population,  and  in  fact  no  parametric  modeling  is  required  for  the  outlier  population. 
Furthermore,  although  with  weaker  assumptions  it  is  more  powerful  than  the  standard  likelihood 
ratio  test  in  the  simpler  non  mixture  situation  in  which  the  standard  test  applies. 

We  have  also  investigated  bootstrapping  the  distributions  of  the  test  statistics.  The 
computationally  intensive  resampling  method  seems  to  be  quite  effective.  When  the  training 
sample  size  is  large,  we  have  also  suggested  the  bootstrap-one  method,  which  significantly 
reduces  the  computing  time  and  seems  to  have  somewhat  more  power. 

It  should  be  noted  that  the  procedure  could  be  extended  to  cover  the  case  in  which  all  of 
the  training  sample  observations  are  unlabeled.  This,  however,  will  require  dealing  with  issues 
such  as  the  use  of  appropriate  starting  values  and  is  not  considered  here. 
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APPENDIX 


In  this  appendix,  we  show  that  W  ^  /(Xn+i;  On),  as  n  oo,  where  W  is  given  in  (5).  Let 


^o{0)  —  Ii\{Lq{9)}  =  £  i(^)  +  ^/(Xn+i;  0)}.  (-^1) 

Suppose  On  and  0n+\  satisfy  the  conditions  that  L  ](0n)  =  sup  L  \(6)  and  Lo(^n+i)  = 

^  €  © 

sup  Lo(^),  respectively.  Then  /?()(?n+i)  =  0  and  7  [  (?„)  =  0.  Thus,  from 
OeO 

foiOn+i)  =  ^o(^n)  +<?o(^r.)(^r,-i  "  +  Smaller  terms 

=  |[ln{/(X„+i;  0)}]  .  +e'^{6n)i9n^i  -?„)  + smaller  terms, 

we  have 

=  0,{i),  (A2) 

since  f;(?„+i)  =0,  Is  of  order  Op(n),  and  J(ln{/(A'„+,;  d)}|  ‘sOp(l)-  Now  by 

(Al)  and  (A2), 

^  r\j 

W  =  exp{£o(^n+l)  -  £  l(^n)} 

=  eXp{£o(^n)  +  £oi0n)i0n+l  "  ^n)  +  “  £  l(^n)} 

=  exp[ln{/(X„4-i:^n)}]+Op(i) 

=  /(x:„+i:?n)  +  Op(i), 

completing  the  proof 
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Table  1.  Comparisons  of  significant  level  and  power  of  the  standard  likelihood  ratio  test 
and  modified  likelihood  ratio  test,  using  two  (Full  and  One)  bootstrap  approximations. 


n 

Significance  Level 

Power 

Standard 

Modified 

Standard 

Mo 

dined 

Full 

One 

Full 

One 

Full 

One 

Full 

One 

15 

.048 

.118 

.065 

.118 

.522 

.729 

.568 

.729 

20 

.048 

.100 

.063 

.100 

.541 

.709 

.588 

.709 

25 

.036 

.081 

.048 

.081 

.563 

.704 

.601 

.704 

30 

.047 

.084 

.051 

.084 

.579 

.718 

.609 

.718 

50 

.046 

.064 

.050 

.064 

.626 

.696 

.645 

.696 

100 

.056 

.059  • 

.057 

.059 

.646 

.677 

.657 

.677 

150 

.059 

.057 

.061 

.057 

.655 

.703 

.665 

.703 

s.e. 

.007 

.015 

Table  2a.  Significance  level  and  power  of  new  test  in  Example  2; 

p^=z  p^  =  0.5,  n  =  60,  J5  =  199,  1000  replications 


Level 

.050  (s.e.  .007) 

k 

1^ 

2 

3 

4 

5 

6 

7 

8 

9 

Power 

1.000 

.984 

.754 

.226 

.031 

.231 

.767 

.980 

1.000 

S.e. 

.001  n 

.004 

.014 

.013 

.006 

.013 

.014 

.004 

.001 

Table  2b.  Significance  level  and  power  of  new  test  in  Example  2; 
Pi  =  0.25,  P2  =  0.75,  n  =  60,  B  =  199,  1000  replications 


Level 

.055  (s.e.  .007) 

k 

1 

2 

3 

4 

5 

6 

7 

8 

9 

Power 

.999 

.970 

.709 

.242 

.701 

.972 

.999 

S.e. 

.001 

.005 

.014 

.001 

28 


-  Mixture  distributions  for  Example  2a  where/ (x)  =  5.iV(/xi  X)i)  +  .bN (/i2.X)2) 
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Figure  2  -  Mixture  distributions  for  Example  2a  showing  means  of  outlier  distributions  in  the  simulations 
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3.0  Positive  Orthant  Alternatives 


In  previous  applications  of  outlier  tests  a  symmetric  alternative  region  has  been 
assumed.  This  is,  in  general,  not  the  best  approach;  and,  it  will  clearly  sacrifice 
power,  i.e.,  reduce  the  detection  probability.  Since  in  most  cases  some  information 
is  known  regarding  the  direction  of  the  selected  features  relative  to  a  nuclear 
event,  an  alternative  region  can  be  defined  which  takes  advantage  of  this 
information.  The  resulting  test  would  then  be  significantly  more  powerful.  In  this 
section  we  introduce  some  methodology  for  addressing  this  difficult  problem. 


3.1  Background 

Since  it  is  quite  possible  that  a  nuclear  event  is  likely  to  occur  in  a  region  where  no 
previous  nuclear  activity  has  transpired,  a  primary  concern  is  to  detect  an  unusual 
event.  Let  the  /7-dimensional  variable  V  =  (Vi,V2,...,Vp)  characterize  the 
occurrence  of  an  event.  The  F/  will  be  referred  to  as  "script”  variables  or 
"features."  Suppose  that  a  training  sample  {Vi}”^  is  available  from  past  events 

and  a  new  observation,  V„+i,  is  obtained  which  must  be  classified  as  to  whether  or 
not  it  belongs  to  the  same  population  as  the  training  sample.  Baek,  et.  al.  (1992) 
constructed  a  test  for  this  situation  by  appl5dng  the  parametric  bootstrap  (Efron 
(1979))  to  generalized  likelihood  ratios.  This  approach,  the  bootstrapped 
generalized  likelihood  ratio  test  (BGLRT),  was  shown  to  be  applicable  when  the 
observations  are  from  either  mixtures  of  discrete  and  normally  distributed 
continuous  variables  or  from  a  multivariate  normal  distribution.  Miller,  et.  al. 
(1993)  also  showed  that  the  BGLRT  is  also  of  utility  when  data  are  missing. 


In  the  case  where  the  observations  are  from  a  /(-dimensional  normal  population, 
the  likelihood  ratio  test  leads  to  the  two-sample  Hotelling  statistic  and  an  F 
statistic  with  p  and  n  -  p  degrees  of  freedom.  Bootstrapping  this  statistic  amounts 
to  empirically  determining  the  critical  points  of  the  distribution  of  the  statistic 
under  the  null  hypothesis  that  the  observation  to  be  classified  is  a  member  of  the 
population  of  interest.  Through  Monte  Carlo  investigation,  the  performance  of  the 
and  the  bootstrapped  (either  parametric  or  nonparametric)  P-  can  be  found  to 
be  effectively  equivalent.  Also  at  least  in  the  one  dimensional  case,  where  exact 
tests  are  available  for  comparison,  the  nonparametric  bootstrapped  P  can  be 
found  to  be  quite  robust  to  the  effects  of  non-normality  and  to  have  reasonable 
power  when  compared  to  the  appropriate  uniformly  most  powerful  test. 


This  report  will  examine  the  case  where  the  observations  are  from  a  p-dimensional 
normal  distribution.  In  this  context,  the  P  statistic  is  testing  the  null  hypothesis 
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that  the  mean  of  the  population  associated  with  the  training  sample  is  equal  to  the 
mean  of  the  population  associated  with  the  new  observation  versus  the  alternative 
that  at  least  one  of  the  elements  of  the  mean  is  different. 


In  the  situation  of  monitoring  nuclear  proliferation,  it  is  possible  that  at  least  the 
order  relationships  of  the  individual  features  of  the  two  populations  may  be  known 
a  priori.  For  example,  if  the  observations  were  from  a  single  monitoring  station 
and  were  three  dimensional,  consisting  of  the  frequency  and  the  magnitude  of  the 
real  and  complex  poles  of  an  AR(3)  model,  then  through  analysis  of  existing  data  it 
is  known  that  the  complex  pole  of  an  explosion  has  a  larger  magnitude  than  the 
complex  pole  of  an  earthquake.  Further,  the  frequency  of  the  complex  pole  for  an 
explosion  will  be  less  than  that  of  an  earthquake.  Thus  in  testing  the  hypothesis 
that  a  given  event  is  an  earthquake  versus  the  alternative  hypothesis  that  it  is  an 
explosion,  one  would  want  the  alternative  region  focused  on  large  poles  and  small 
frequencies  rather  than  simply  "extreme  poles"  and  "extreme  frequencies."  Such  a 
region  is  called  a  "focused"  alternative  region. 


It  is  desirable  to  take  advantage  of  this  before-hand  knowledge  by  developing  a 
test  of  hypothesis  which  is  more  focused  than  the  standard  two-sample  . 
Perlman  (1969)  developed  theory  for  a  single-sample  P-  with  a  one-sided 
alternative.  In  the  single-sample  problem,  the  null  hypothesis  of  the  standard  P  is 
that  the  mean  of  a  set  of  random  observations  is  equal  to  zero  versus  the 
alternative  that  at  least  one  of  the  elements  of  the  mean  is  not  equal  to  zero.  His 
work  was  taken  by  Tang  (1994)  and  applied  to  the  alternative  hypothesis  that  the 
sum  of  the  elements  of  the  mean  is  non-negative,  a  so-called  half  space  alternative. 


This  report  will  expand  the  single-sample  P  with  one-sided  alternative  to  the 
situation  where  the  alternative  hypothesis  is  that  each  element  of  the  mean  is 
positive.  These  results  will  then  be  extended  to  the  special  two-sample  P 
situation  of  interest  in  monitoring  nuclear  proliferation,  in  which  the  second  sample 
consists  of  a  single  observation.  Some  Monte  Carlo  results  will  then  be  presented 
to  illustrate  the  gains  in  power  that  can  be  achieved  as  contrasted  with  the  standard 
two-sample  P.  No  real  data  will  be  processed  for  this  report.  However, 
examples  using  real  data  will  be  included  in  the  next  report. 


3.2  The  Single-sample  P  with  a  Half  Space  Alternative 


A  half  space  is  a  set  of  the  form  {v  |  v'u  >  0}  for  some  fixed  vector  u.  A  set  is 
one-sided  if  it  is  contained  in  the  interior  of  a  half  space.  A  cone  means  a 
positively  homogeneous,  closed,  and  one-sided  set.  For  the  one-sample  situation. 
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it  is  assumed  that  there  is  a  sample  of  size  n  from  a  /7-dimensional  normal  with 
unknown  mean  fx  and  covariance  E.  The  problem  is  to  test  the  null  hypothesis 
that  /i  =  0  versus  the  alternative  hypothesis  that  /i  €  C,  where  C  denotes  a  cone. 


Perlman  derived  the  likelihood  ratio  statistic  to  be 

j/(x,yXO  =  ||y||^  (1  +  liy  -  (') 


where  x  denotes  rfi  times  the  sample  mean,  A  is  (w  -  1)  times  the  sample 
covariance  matrix,  and  y  is  the  vector  in  C  which  is  closest  to  x  in  terms  of  the 
Mahalanobis  distance; 


lly-  x||5=(x-y)M-Hx-y).  (2) 

Tang  proved  that  if  C  contains  an  open  set,  and  if  is  any  half  space  containing 
C,  then  the  test  using  the  statistic  U(x,y4JI*)  is  uniformly  more  powerful  than  the 
statistic  U(x,y^,C).  He  used  a  result  of  Perlman  to  calculate  critical  points  for  the 
statistic  U(x,y^,H^)  and  compared,  via  Monte  Carlo  results,  the  power  of 
U{x,y,A,H*)  versus  the  standard  7^. 


3.3  Extension  to  the  Positive  Orthant  Alternative  Hypothesis 


It  is  not  possible  to  analytically  calculate  critical  points  for  the  test  statistic 
U(x,y,A,Q^)  in  which  C  is  the  positive  orthant,  Q^,  where  all  the  elements  of  the 
mean  are  non-negative.  However,  bootstrapping  provides  a  means  of  estimating 
the  critical  points  of  the  statistic  under  the  null  hypothesis  that  the  mean  is  zero. 
The  process  for  conducting  a  test  of  size  a  using  nonparametric  bootstrapping  is 
described  in  the  following  steps. 

Procedure  for  the  Bootstrap  Test  of  the  Null  Hypothesis  that  the  Mean  of  a 
Sample  is  Zero  Versus  the  Alternative  that  the  Mean  is  in  the  Positive  Orthant 

1.  Compute  the  Uix,m,A,Q*)  statistic  using  the  original  set  of 
observations. 

2.  Subtract  the  mean  of  the  sample  from  each  of  the  observations,  leaving 
the  n  residuals. 
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3.  Draw  with  replacement  a  random  sample  of  size  n  from  the  residuals 
(for  which  the  null  hypothesis  holds  conditionally). 

4.  For  that  sample,  compute  the  statistic  where  the  * 

denotes  the  same  quantities  in  (1)  evaluated  on  the  (re)sampled  residuals. 

5.  Repeat  steps  3  and  4  a  total  of  B  times  and  save  the  values  of  U*. 

6.  Calculate  the  (1  -  a)th  quantile  of  these  B  realizations,  ui^,  the 
bootstrap  estimate  of  the  critical  value. 

7.  If  the  statistic  of  step  1  equals  or  exceeds  ui.q,  the  null  h5q)othesis  is 
rejected  in  favor  of  the  alternative  that  eQ*. 


The  main  difficulty  with  carrying  out  the  preceding  steps  is  in  the  calculation  of  the 
statistic  given  by  (1).  If  all  the  elements  of  the  vector  x  are  non-negative,  then  y  = 
X,  the  denominator  of  (1)  is  unity,  and  the  calculation  involved  is  the  same  as  that 
associated  with  the  standard  P  statistic.  If,  however,  some  or  all  of  the  elements 
of  X  are  negative,  then  it  remains  to  find  the  vector  y  in  which  is  closest  to  x  in 
terms  of  the  Mahalanobis  distance.  To  develop  a  "feel"  for  the  process  the 
methodology  for  the  two  dimensional  case  will  first  be  described  and  then  a 
general  procedure  will  be  stated  and  summarized  in  a  series  of  steps. 


3.3.1  Calculation  of  f/(x,yy4,^)  in  Two  Dimensions 


In  the  case  where  p  =  2,  let  the  axis  be  the  abscissa  and  the  p2  axis  be  the 
ordinate.  The  positive  quadrant,  where  values  of  /Jiand  p2  are  positive,  is  labeled 
as  quadrant  1.  Quadrants  2  -  4  are  found  by  moving  counter-clockwise.  So,  if  the 
first  element  of  x  is  positive  and  the  second  element  is  negative,  then  x  is  located  in 
quadrant  4,  below  the  positive  p\  axis.  This  situation  is  depicted  in  Figure  1, 
which  shows  x  at  the  center  of  an  ellipse  which  is  touching  the  axis  at  the  point 
yi  =  (yi,0).  The  shape  of  the  ellipse  is  governed  by  the  elements  of  A'^.  The 
vector  yi  is  the  closest  point  to  x  on  the  axis.  It  is  the  point  which  minimizes 
the  distance; 


^(yi)  =  (x-(yi,0)7^-'(x-(yi,0)'). 


(3) 


If  the  elements  of  the  A'^  matrix  are  denoted  by  Cy,  i,j  =  1,2,  then  the  value  of 
which  minimizes  (3)  can  be  found  to  be: 
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y\={cnX2+cuX\)lcxi. 


(4) 


In  the  figure,  (yi,0)  is  located  on  the  non-negative  part  of  the  axis,  and  thus  is  a 
candidate  to  be  the  closest  point  in  Q"  to  x.  In  general,  if^i  were  found  to  be 
negative,  then  the  point  (0,0)  would  be  the  point  in  closest  to  x  and  on  the  y,i 
axis  .  There  is  also  a  point  y2  =  (0j^2)  which  is  the  closest  point  to  x  on  the  ^2 
axis.  It  is  the  point  which  minimizes  the  distance; 


qiyi)  =  (x  -  (0xV2)')'^*‘(x  -  (0xV2)'), 


(5) 


and  the  value  of >^2  which  minimizes  (5)  is: 


T2  =  (^^12X1  +  022X2)! C22- 


(6) 


If  ^2  were  found  to  be  negative  (which  it  would  be  in  Figure  1),  then  the  point 
(0,0)  would  be  the  point  in  Q*  closest  to  x  and  on  the  ^2  axis.  The  value  of  y  that 
is  used  in  (1)  is  the  one  associated  with  the  smaller  of^(yi)  and  ^(y2);  otherwise, 
y  =  0,  if  neither  yi  or  y2  are  in  . 


In  summary,  if  x  does  not  fall  in  the  first  quadrant,  then  depending  on  the  quadrant 
in  which  it  does  fall  and  its  sample  covariance  structure,  there  may  or  may  not  be  a 
point  y  on  the  border  of  0^  other  than  the  point  (0,0),  in  which  case  U(x,yJ,Q^) 
is  equal  to  zero.  The  algorithmic  approach  for  calculating  the  statistic  U{\,yA,Q^) 
in  two  dimensions  is  summarized  in  the  following  steps: 


Algorithm  for  Calculating  Uix.yA.O^')  in  Two  Dimensions 

1.  Set  y  =  0  and  f/min  =  ||x||^. 

2.  Using  (4),  calculate yi.  Ifyi  <  0,  go  to  step  4. 

3.  Calculate  ^(yi)  using  (3).  Set  y  =  (yi,0)'  and  <4in  =  ^(yi)- 

4.  Using  (6),  calculate  y2  If>'2  <  0,  go  to  step  6. 

5.  Calculate  ^'(^2)  using  (5).  If^(y2)  set  y  -  (0^2)'- 

6.  The  current  value  of  y  is  used  to  calculate  U(x,y^,Q*)  using  (1). 
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3.3.2  Calculation  of  C/(x,yy4,2^)  in />-Space 


In  p-space,  the  process  of  calculating  the  statistic  U(x,y,A,Q*)  is  a  generalization 
(albeit  more  complicated),  of  what  is  done  in  the  case  where  p  =  2.  Again,  if  x  is 
in  the  positive  orthant  Q^,  then  y  =  x  and  U{\,yA,Q^)  «  equal  to  the  standard  7^ 
statistic.  Otherwise,  the  process  is  to  find  one  or  more  points  located  on  the 
periphery  of  Q",  in  the  highest  possible  dimensional  subspace,  which  are 
candidates  in  the  positive  orthant,  and  then  determine  which  of  those  points  is 
closest  to  X.  This  is  accomplished  by  initially  searching  in  (p  -  1)  space,  then  (p  - 
2)  space,  and  so  forth,  until  either  a  point  is  found  or  else  it  is  determined  that 
there  is  no  positive  point  on  any  of  the  p  axes  which  is  closest  to  x. 


Denote  by  y^y  a  vector  with  the  yth  element  being  zero,  i.e.,  yg>  = 

. •••»»)'•  The  values  of  the  elements  of  y^j  are  those  which 

minimize  the  distance  in  the  following  equation: 


^(yfi))  =  (x-y6y)'^‘'(x-y(y)- 


(7) 


The  elements  of  each  yg;  are  obtained  in  the  following  manner.  As  before  the 
elements  of/I'*  are  denoted  by  Cy,  ij  =  1,2,...,/?,  and  set 


d=A'^x.  (8) 

Let  C^j)  denote  the  (p  -  1)  x  (p  -  1)  matrix  obtained  by  deleting  the  yth  row  and 
column  of  and  designate  by  the  (p  -  1)  vector  realized  when  the  yth 
element  of  d  is  deleted.  The  elements  found, 

respectively,  as  the  (p  - 1)  elements  of  the  vector 

^)=cjd(f).  (9) 


If  any  of  the  elements  of  zyj  are  negative,  then  the  corresponding  point  yy)  is  not 
in  the  positive  orthant.  If  there  is  more  than  one  yy)  in  Q^,  then  it  can  be 
determined  which  is  the  closest  to  x. 


If  there  is  no  zy)  in  which  all  of  the  elements  are  non-negative,  then  the  search 
moves  to  the  next  lower  dimension.  This  involves  defining  quantities  y^j),  Cyj), 


37 


and  A(ij),  where,  for  example,  C(ij)  is  formed  by  deleting  the  /th  andyth  rows  and 
columns  of >4'* .  Then  one  obtains  Z(ij) : 


-1 

^(ij)  ~ 


(10) 


There  are  possibilities  to  examine  before  moving  to  the  next  lower  dimension, 

if  required.  Each  time  the  dimensionality  decreases,  the  number  of  elements  of  d 
which  are  used  decreases  by  one,  and  one  more  row  and  column  of  A'^  are  not 
used.  Eventually,  if  there  is  no  point  in  which  is  found  to  be  closest  to  x,  then 
y  =  0  and  t/  =  0.  This  situation  would  require  the  maximum  of  2^-  1  searches  for 

y- 


3.4  Application  to  a  Training  Sample  and 
Classification  of  a  New  Observation 

The  extension  of  the  single-sample  test  of  hypothesis,  that  the  mean  is  zero  versus 
the  focused  alternative  that  the  mean  is  in  the  positive  orthant,  to  the  case  of 
interest  in  this  report,  requires  minor  modifications  to  the  procedure  outlined  in  the 
single-sample  case.  Let  the  /^-dimensional  variable  V  =  (Vi,V2,...,Vp)  characterize 
the  occurrence  of  an  event.  Suppose  that  a  training  sample  {V,  is  available 
from  past  events  and  a  new  observation,  V„+i,  is  obtained  which  must  be  classified 
as  to  whether  or  not  it  belongs  to  the  same  population  as  the  training  sample. 
Denote  by  V  the  mean  of  the  training  sample  observations  and  let  x  =  V„+i  -  V. 
The  counterpart  to  (1)  for  the  test  statistic  for  this  special  two-sample  case  is: 

U{x,yJ,Q^)  =  ||y||^  ((//  +  l)/n  +  ||y  -  4^’ 


where,  as  before,  y  is  the  closest  point  in  to  x  as  measured  by  (2).  Other  than 
the  different  definition  of  x  and  using  (1 1)  in  lieu  of  (1),  the  only  other  difference  is 
that  it  is  necessary  to  add  a  step  in  the  bootstrapping  test  procedure  in  which  a 
single  observation  is  drawn  in  addition  to  the  training  set.  With  regards  to  finding 
the  closest  vector  to  x,  all  the  equations  and  procedures  of  the  previous  sections 
are  applicable.  The  process  to  conduct  the  desired  test  of  hypothesis  of  size  a  is 
summarized  in  the  following  steps. 

Procedure  to  Test  the  Null  Hypothesis  that  the  Mean  of  a  New  Observation  is 
Equal  to  the  Mean  of  a  Training  Sample  Versus  the  Alternative  that  the  Mean  of 
the  New  Observation  is  in  the  Positive  Orthant 

1.  Compute  U{x,yA,Q')  using  the  original  set  of  observations  and  (1 1) 
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2.  Subtract  the  mean  of  the  training  sample  from  each  of  the  observations  in  the 
training  sample,  leaving  the  n  residuals 

3.  Obtain  a  bootstrapped  training  sample  by  drawing  with  replacement  a  random 
sample  of  size  n  from  the  residuals 

4.  Obtain  a  bootstrapped  new  observation  by  drawing  one  of  the  residuals  with 
replacement 

5.  For  those  samples,  compute  the  statistic  U(x  ^ where  the  *  denotes 
the  same  quantities  in  (1 1)  evaluated  on  the  (re)sampled  residuals 

6.  Repeat  steps  3  to  5  a  total  of  B  times  and  save  the  values  of  U* 

7.  Calculate  the  (1  -  a)th  quantile  of  the  B  realizations,  u\.a,  the  bootstrap 
estimate  of  the  critical  value 

8.  If  the  statistic  of  step  1  equals  or  exceeds  wi^,  the  null  hypothesis  is  rejected 
in  favor  of  the  alternative 

To  illustrate  the  improved  power  that  is  realized  with  the  focused  alternative 

hypothesis,  a  sequence  of  Monte  Carlo  experiments  was  conducted  in  which  the 

standard  7^  was  contrasted  to  the  focused  alternative.  In  two  dimensions, 

observations  in  a  training  sample  were  randomly  generated  wdth  parameters: 


fo\ 

'i  o' 

=  1  0  )  Si= 

0  1 

while  the  observation  V„+i  was  randomly  generated  with  the  same  covariance 
matrix  and  a  mean: 


Power  was  based  on  1,000  replications  at  each  value  of  A.  It  was  estimated  based 
on  the  proportion  of  the  1,000  iterations  in  which  Vn+i  was  found  to  be  in  the 
critical  region.  The  value  of  B,  the  number  of  bootstraps  used  in  obtaining  a 
focused  alternative  critical  point,  was  selected  to  be  499.  The  desired  significance 
level  for  the  test  was  a  =  0.05.  Figure  2  presents  the  results  for  a  training  sample 
of  30.  An  interesting  feature  of  this  figure  is  that  the  test  based  on  the  statistic 
U(x,y^,Q*)  has  virtually  no  power  for  negative  values  of  A  while  displaying 
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considerably  more  power  than  the  standard  two-sample  7^  when  A  is  in  the  first 
quadrant. 


When  the  number  of  dimensions  is  increased  to  four,  the  improvement  of 
U(x,y,A,Q^)  over  the  P  is  even  more  impressive,  as  illustrated  in  Figure  3.  To 
obtain  this  figure,  the  numbers  of  bootstraps  and  replications  remained  the  same  as 
with p  =  2,  while  the  statistical  distribution  parameters  were: 
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3.5  Conclusion 


In  monitoring  nuclear  proliferation,  it  is  possible  that  at  least  the  relative  order 
relationships  of  the  individual  features  of  two  different  populations  may  be  known 
a  priori.  In  this  report  a  test  of  hypothesis  employing  a  focused  alternative  region, 
for  taking  advantage  of  such  a  situation  in  classifying  new  observations,  is 
described.  The  methodology  to  employ  when  using  this  test  is  outlined;  and, 
representative  results  for  different  sample  sizes  and  feature  dimensions  are 
presented.  These  results  indicate  that  this  test  should  be  a  useful  tool  in  the 
classification  process. 
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Quadrant  2  Quadrant  1  (Q+) 
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Figure  1 :  Closest  Point  in  Q+  to  Point  in  Quadrant  4 
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Figure  2:  Power  Comparison  for  Two  Dimensions  (n=30) 
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